Sometimes two-dimensional visuals are not enough. Data often carries extra context that can help surface latent patterns. Many analysts tend to think in two dimensions, as with scatter plots, but there’s more to it. Let’s say we were provided a nice, clean set of data that contains the following:
What can you do with that data? Well, it turns out that these quantities are related. [3 lines of description go here]
How did we get to this?
#Set working directory
setwd("/Users/sigmamonstr/Google Drive/DOC/0_Project Tracking/Commerce Academy/Storytelling_with_R/ACS_14")
#Load in data
load("base_file.Rda")
head(data)
## state_fips region region_name id id2
## 1 01 3 South Region 0500000US01117 1117
## 2 01 3 South Region 0500000US01115 1115
## 3 01 3 South Region 0500000US01057 1057
## 4 01 3 South Region 0500000US01129 1129
## 5 01 3 South Region 0500000US01049 1049
## 6 01 3 South Region 0500000US01055 1055
## geography pct_poverty emp_status households hs_grad
## 1 Shelby County, Alabama 8.6 6.2 74790 91.3
## 2 St. Clair County, Alabama 16.1 9.5 31673 82.4
## 3 Fayette County, Alabama 20.8 11.3 6967 76.1
## 4 Washington County, Alabama 15.9 19.8 6218 82.0
## 5 DeKalb County, Alabama 20.1 9.5 24743 73.2
## 6 Etowah County, Alabama 19.6 10.7 40001 82.1
#Load in Threejs library
library(threejs)
We can see that there are clear relationships between unemployment, poverty, and educational attainment. But there isn’t much detail, and the graphs aren’t pretty.
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad)
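To put rough numbers on what the plot suggests, pairwise correlations are a quick check. The sketch below uses only the six Alabama rows printed by head(data) above, so the values say nothing about the full dataset; the name `sample_rows` is made up for this illustration.

```r
# Six rows copied from the head(data) output above (illustration only)
sample_rows <- data.frame(
  emp_status  = c(6.2, 9.5, 11.3, 19.8, 9.5, 10.7),
  pct_poverty = c(8.6, 16.1, 20.8, 15.9, 20.1, 19.6),
  hs_grad     = c(91.3, 82.4, 76.1, 82.0, 73.2, 82.1)
)

# Pairwise Pearson correlations between the three plotted variables
cor_matrix <- cor(sample_rows)
print(round(cor_matrix, 2))
```

On the full data frame, the equivalent call would be cor(data[, c("emp_status", "pct_poverty", "hs_grad")], use = "complete.obs"), assuming all three columns are numeric.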
Let’s stylize the plots. First, let’s name the axes with axisLabels, which accepts a vector of axis names. The order matters and is as follows: x-axis, z-axis, y-axis.
#Note that axisLabels should follow this order: c(x, z, y)
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,
axisLabels=c("unemployment","hs degree or above","poverty rate"))
Now let’s change the rendering engine to give more depth to the plot. We do so by setting renderer = "canvas". This tells threejs to draw the points with the browser’s canvas renderer rather than WebGL.
#Depth using renderer
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,
axisLabels=c("unemployment","hs degree or above","poverty rate"),
renderer="canvas")
Now, let’s set the color of the points, resize the points, and turn off the y-axis flip so the axis ascends from the origin. To do so, we:
- set col = "slategrey"
- set size = 0.5
- set flip.y = FALSE
#Point size, color, don't flip y axis
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,
axisLabels=c("unemployment","hs degree or above","poverty rate"),
renderer="canvas", flip.y=FALSE, col="slategrey",
size=0.5)
Ultimately, we want to find more patterns. Coloring the points by region lets us see whether some regions are worse off than others. But which? It turns out there are 4 regions:
unique(data$region_name)
## [1] South Region West Region Northeast Region Midwest Region
## Levels: Midwest Region Northeast Region South Region West Region
unique(data$region)
## [1] 3 4 1 2
First, let’s give each region its own color: create a new variable for colors, data$colors, then assign a hex code to each region.
#Set up colors by region
data$colors <- ""
data$colors[data$region==1] <- "#011efe"
data$colors[data$region==2] <- "#0bff01"
data$colors[data$region==3] <- "#fe00f6"
data$colors[data$region==4] <- "#fdfe02"
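The same assignment can also be written as a single lookup, which scales better if regions are added later. This is a minimal sketch on a toy data frame; `region_palette` and `toy` are hypothetical names, and every hex code is written in six-digit #RRGGBB form.

```r
# Hypothetical lookup table: region code -> hex color
region_palette <- c("1" = "#011efe", "2" = "#0bff01",
                    "3" = "#fe00f6", "4" = "#fdfe02")

# Toy stand-in for data$region
toy <- data.frame(region = c(3, 4, 1, 2, 3))

# Index the palette by name; unname() drops the "1".."4" labels
toy$colors <- unname(region_palette[as.character(toy$region)])
print(toy)
```

Applied to the real data, the last step would read data$colors <- unname(region_palette[as.character(data$region)]).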
Now, let’s set col = data$colors so that R knows which color corresponds to each of the roughly 3,000 points. The data is also sorted by region before plotting.
data <- data[order(data$region),]
#Grouped patterns
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,
axisLabels=c("unemployment","hs degree or above","poverty rate"),
col=data$colors, flip.y=FALSE,
renderer="canvas",
size=0.5)
It’s a bit annoying to look at the chart without knowing which point corresponds to which county. Let’s add labels for each point that show up upon mousing over.
#add labels to points
scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,
axisLabels=c("unemployment","hs degree or above","poverty rate"),
col=data$colors,
labels=paste0(data$region_name,": ",data$geography),
size=0.5,
renderer="canvas")
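The labels argument takes one string per point, so any text that can be built from the data frame can go into the hover label. As a sketch, sprintf() can fold the poverty rate into each label; the `toy` data frame reuses two rows from the head(data) output, and `hover_labels` is a made-up name.

```r
# Two rows taken from the head(data) output above (illustration only)
toy <- data.frame(
  region_name = c("South Region", "South Region"),
  geography   = c("Shelby County, Alabama", "Fayette County, Alabama"),
  pct_poverty = c(8.6, 20.8)
)

# One formatted string per row; %% prints a literal percent sign
hover_labels <- sprintf("%s: %s (poverty %.1f%%)",
                        toy$region_name, toy$geography, toy$pct_poverty)
print(hover_labels)
```

The resulting vector could then be passed as labels = hover_labels in place of the paste() call.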
In short, we can draw the following key insights from this graph.
Sometimes graphs don’t get the point across. Maps, while overused, can give a better indication of patterns.
Based on our 3-D graphs, we could see clustering in the regions’ economic performance. We can see the mess of points more clearly on a map. Observations:
We can use the leaflet library to bring a geographic spin to the data. To initiate a map, we only need to open the leaflet library, then run the following:
library(leaflet)
leaflet()
You’ll see that the map is blank with a zoom control panel on the upper left. That’s because the map doesn’t have data in it. There are dozens of free layers we can use:
leaflet() %>%
addProviderTiles("Stamen.Toner")
leaflet() %>%
addProviderTiles("CartoDB.Positron")
Now let’s center and zoom in on the contiguous US.
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(lng = -98.3, lat = 39.5, zoom = 4)
We now need data. Let’s get a shapefile. (diagram of shapes goes here)
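shape_direct <- function(url, shp) {
  library(rgdal) ##provides readOGR()
  temp <- tempfile()
  download.file(url, temp, mode = "wb") ##download the URL target to the temp file (binary mode keeps the zip intact)
  unzip(temp, exdir = getwd()) ##unzip that file into the working directory
  return(readOGR(paste(shp, ".shp", sep = ""), shp)) ##read the named layer from the .shp file
}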
shp <- shape_direct(url="http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_county_20m.zip",
shp= "cb_2014_us_county_20m")
## OGR data source with driver: ESRI Shapefile
## Source: "cb_2014_us_county_20m.shp", layer: "cb_2014_us_county_20m"
## with 3220 features
## It has 9 fields
## Warning in readOGR(paste(shp, ".shp", sep = ""), shp): Z-dimension
## discarded
Now add the shapefile to the map:
leaflet(data=shp) %>%
addProviderTiles("CartoDB.Positron") %>%
setView(lng = -98.3, lat = 39.5, zoom = 4) %>%
addPolygons(fillColor = "blue",
fillOpacity = 0.8,
color = "white",
weight = 0.5)
The shapefile on its own doesn’t have the data from the scatter chart portion. We need to join the data.
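library(stringr) #provides str_pad()
data$GEOID <- str_pad(as.character(data$id2), 5, pad = "0") #left-pad county codes to 5 characters
shp@data$GEOID <- as.character(shp@data$GEOID)
shp <- merge(shp, data, by = "GEOID") #join on the shared GEOID key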
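A join like this silently leaves gaps when keys don’t line up, so it can be worth checking the padded codes against the shapefile’s GEOIDs. The sketch below uses toy vectors rather than the real objects: `id2` and `shp_geoid` are stand-ins, and formatC() is used here as a base R alternative to str_pad(), assuming the county codes are numeric.

```r
# Toy stand-ins: numeric county codes (as in data$id2) and shapefile GEOIDs
id2       <- c(1117, 1115, 6037)
shp_geoid <- c("01117", "01115", "06037", "72001")

# Left-pad with zeros to a width of 5 characters
geoid <- formatC(id2, width = 5, format = "d", flag = "0")

# Keys present on one side only; both should be inspected before merging
unmatched_in_data <- setdiff(geoid, shp_geoid)
unmatched_in_shp  <- setdiff(shp_geoid, geoid)
print(unmatched_in_data)
print(unmatched_in_shp)
```

Polygons whose GEOID has no match in the data end up with missing values after the merge, which is useful to know before coloring the map.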
pal <- colorQuantile("YlGn", NULL, n = 30)
state_popup <- paste0("<strong>County: </strong>",
shp@data$geography,
"<br><strong>Poverty Rate (%): </strong>",
shp@data$pct_poverty)
leaflet(data = shp) %>%
addProviderTiles("CartoDB.Positron") %>%
setView(lng = -98.3, lat = 39.5, zoom = 4) %>%
addPolygons(fillColor = ~pal(pct_poverty),
fillOpacity = 0.8,
color = "#BDBDC3",
weight = 0.1,
popup = state_popup)
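The palette above colors counties by quantile rank rather than by raw value, so each color covers roughly the same number of counties. The idea can be sketched in base R with quantile() and cut(). This is an illustration of quantile binning in general, not leaflet’s internal code; the six poverty rates come from the head(data) output, and the three greens are chosen just for the example.

```r
# Poverty rates from the six rows shown in head(data)
poverty <- c(8.6, 16.1, 20.8, 15.9, 20.1, 19.6)

# Break points at the 0%, 33%, 67% and 100% quantiles
n_bins <- 3
breaks <- quantile(poverty, probs = seq(0, 1, length.out = n_bins + 1))

# Assign each value to a bin; include.lowest keeps the minimum in bin 1
bin <- cut(poverty, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Illustrative light-to-dark greens, one per bin
bin_colors <- c("#f7fcb9", "#addd8e", "#31a354")
print(data.frame(poverty, bin, color = bin_colors[bin]))
```

With n = 30, as in the map above, the same logic splits the counties into thirty equally populated groups, which makes small differences between neighboring counties visible even when the raw values are close.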